
An MDP model-based reinforcement learning approach for production station ramp-up optimization: Q-learning analysis


Abstract

Ramp-up is a significant bottleneck for the introduction of new or adapted manufacturing systems. The effort and time required to ramp up a system are largely dependent on the effectiveness of the human decision-making process in selecting the most promising sequence of actions to improve the system to the required level of performance. Although existing work has identified significant factors influencing the effectiveness of ramp-up, little has been done to support the decision making during the process. This paper approaches ramp-up as a sequential adjustment and tuning process that aims to get a manufacturing system to a desirable performance in the fastest possible time. Production stations and machines are the key resources in a manufacturing system. They are often functionally decoupled and can be treated in the first instance as independent ramp-up problems. Hence, this paper focuses on developing a Markov decision process (MDP) model to formalize ramp-up of production stations and enable their formal analysis. The aim is to capture the cause-and-effect relationships between an operator's adaptation or adjustment of a station and the station's response, in order to improve the effectiveness of the process. Reinforcement learning has been identified as a promising approach to learn from ramp-up experience and discover more successful decision-making policies. Batch learning in particular can perform well with little data. This paper investigates the application of a Q-batch learning algorithm combined with an MDP model of the ramp-up process. The approach has been applied to a highly automated production station where several ramp-up processes are carried out. The convergence of the Q-learning algorithm has been analyzed along with the variation of its parameters. Finally, the learned policy has been applied and compared against previous ramp-up cases.
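To make the abstract's approach concrete, the sketch below illustrates tabular Q-learning driven by batch sweeps over stored experience, i.e. the standard update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) applied repeatedly to a fixed set of logged transitions. The state space (discrete station performance levels), the operator actions, the reward shaping, and the simulated station dynamics are all illustrative assumptions for this sketch, not the paper's actual MDP model or experimental setup.

    # Hypothetical sketch of batch ("Q-batch") learning on a station ramp-up MDP.
    # States, actions, rewards, and dynamics are assumptions, not the paper's model.
    import random
    from collections import defaultdict

    ACTIONS = ["adjust_feed_rate", "retune_gripper", "recalibrate_sensor"]  # assumed adjustments
    ALPHA, GAMMA = 0.1, 0.9    # learning rate and discount factor
    EPISODES, SWEEPS = 50, 20  # ramp-up runs collected, batch passes over the stored data
    TARGET = 5                 # assumed target performance level (states 0..5)

    Q = defaultdict(float)     # tabular Q(s, a), keyed by (state, action)

    def policy(state, epsilon=0.2):
        """Epsilon-greedy action selection over the assumed action set."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def simulate_step(state, action):
        """Stand-in for the real station: returns (next_state, reward).
        Reward is assumed to reflect the station's performance gain."""
        next_state = min(state + random.choice([0, 1]), TARGET)
        reward = 1.0 if next_state == TARGET else -0.1  # bonus at target, small step cost
        return next_state, reward

    # 1) Collect transitions from (simulated) ramp-up episodes.
    batch = []
    for _ in range(EPISODES):
        s = 0
        while s != TARGET:
            a = policy(s)
            s2, r = simulate_step(s, a)
            batch.append((s, a, r, s2))
            s = s2

    # 2) Batch learning: repeatedly sweep the stored experience,
    #    applying the Q-learning update to every logged transition.
    for _ in range(SWEEPS):
        random.shuffle(batch)
        for s, a, r, s2 in batch:
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])

    # Greedy policy learned from the batch, per performance level:
    for s in range(TARGET + 1):
        print(s, max(ACTIONS, key=lambda a: Q[(s, a)]))

Because every sweep reuses the same logged transitions, the value estimates converge with far fewer station interactions than online learning would need, which matches the abstract's point that batch learning can perform well with little data.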
